ABSTRACT


The Defense Language Institute Foreign Language Center (DLIFLC) trains 
students in over 21 foreign languages for the Department of Defense (DoD). The 
National Security Agency (NSA) and Defense Intelligence Agency (DIA) are responsible 
for setting the training objectives for students entering professional fields in intelligence. 

In the past, general proficiency in listening, reading, and speaking skills has been
the focus of language learning and testing in the DoD. Certain minimum scores on the 
Defense Language Proficiency Test (DLPT) are required for certain training and 
operational positions within the DoD. DoD has not established applicable performance 
objective scores for training and operational positions. Individual service commanders at 
DLIFLC may exercise some discretion in borderline cases where general minimum DLPT 
requirements have not been met. They may take into account performance objective 
scores and grant waivers for attending Goodfellow Air Force Base (GAFB) follow-on 
training. 

The purpose of this study is to determine how the performance objective scores
relate to success on the DLPT and how the combination of DLPT and performance 
objective tests might relate to success in follow-on training at GAFB. Success
at GAFB is defined by on-time graduation, number of required special-assistance hours,
and performance on “block” tests.
EXECUTIVE SUMMARY

The Defense Language Institute Foreign Language Center (DLIFLC) trains
students in over 21 foreign languages for the Department of Defense (DoD). The
National Security Agency (NSA) and Defense Intelligence Agency (DIA) are responsible 
for setting the training objectives for students entering professional fields in intelligence. 

In the past, general proficiency in listening, reading, and speaking skills has been 
the focus of language learning and testing in the DoD. Certain minimum scores on the 
Defense Language Proficiency Test (DLPT) are required for certain training and 
operational positions within the DoD. 

DoD has not established applicable performance objective test scores for training 
and operational positions. Individual service commanders at DLIFLC may exercise some 
discretion in borderline cases where general minimum DLPT requirements have not been
met. They may take into account performance objective scores and grant waivers for 
attending Goodfellow Air Force Base (GAFB) follow-on training. 

The aims of the study were to determine how the performance objective scores 
relate to success on the DLPT and how a combination of DLPT and performance 
objective tests might relate to success in follow-on training at GAFB. In part,
we seek “cut-off” scores on performance objective tests that will correlate to success on 
DLPTs and at GAFB. Success at GAFB is defined by on-time graduation, number of 
required special-assistance hours, and performance on “block tests.” 

In the first phase of the study, we used stepwise multiple linear regression to
create a model showing which performance objectives correlated best with the DLPT
score for each language. Once the models were produced, we looked for consistency in
the correlation of performance objectives and the DLPT amongst all the languages, then 
by the category of language difficulty, and finally by category of alphabet type (either 
Roman or non-Roman). 

We then determined cut-off scores for the performance objectives for each 
language that had one performance objective correlating to the DLPT. We calculated the 
cut-off score assuming a Normal probability distribution for DLPT scores, with mean
determined by the performance objective score. The cutoff was the performance 
objective score that gave an 80 percent chance of passing the DLPT. 
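
The cut-off calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the DLPT score is taken to be Normal with mean b0 + b1*x, where x is the performance objective score, and with a constant standard deviation equal to the regression's residual standard error. The function names and the coefficients are hypothetical and are not taken from the thesis.

```python
from statistics import NormalDist

def cutoff_score(b0, b1, resid_sd, passing=40.0, prob=0.80):
    """Performance objective score at which P(DLPT >= passing) equals prob.

    Assumes DLPT | x ~ Normal(b0 + b1 * x, resid_sd ** 2) with b1 > 0.
    """
    if b1 <= 0:
        raise ValueError("this sketch only handles a positive slope")
    z = NormalDist().inv_cdf(prob)            # standard Normal quantile for prob
    required_mean = passing + z * resid_sd    # mean DLPT needed to reach prob
    return (required_mean - b0) / b1

def pass_probability(x, b0, b1, resid_sd, passing=40.0):
    """P(DLPT >= passing) for a student with performance objective score x."""
    return 1.0 - NormalDist(b0 + b1 * x, resid_sd).cdf(passing)

# Hypothetical fitted values, for illustration only.
cut = cutoff_score(b0=20.0, b1=0.3, resid_sd=5.0)
print(round(cut, 1), round(pass_probability(cut, 20.0, 0.3, 5.0), 2))
```

A larger residual standard deviation pushes the cut-off upward, because more of the probability mass falls below the passing score at any given mean.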

For the models that had two performance objectives correlating to the DLPT, we 
created a graph that, given one performance objective score, determines what the student
needs to achieve on the second performance objective to have an 80 percent chance of
passing the DLPT. A passing grade on the DLPT was a score of 40 for DLPT_L 
(listening) and DLPT_R (reading), and 20 for DLPT_S (speaking). 
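
For the two-predictor case, the curve on such a graph could be computed as follows. This is a sketch, assuming the DLPT score is Normal with mean b0 + b1*x1 + b2*x2 and a constant residual standard deviation; the function name and all coefficient values are hypothetical.

```python
from statistics import NormalDist

def required_second_score(x1, b0, b1, b2, resid_sd, passing=40.0, prob=0.80):
    """Score needed on the second objective, given x1 on the first.

    Assumes DLPT | x1, x2 ~ Normal(b0 + b1*x1 + b2*x2, resid_sd ** 2), b2 > 0.
    """
    if b2 <= 0:
        raise ValueError("this sketch only handles a positive second slope")
    z = NormalDist().inv_cdf(prob)
    return (passing + z * resid_sd - b0 - b1 * x1) / b2

# Hypothetical coefficients; each line gives one point on the trade-off curve.
for x1 in (50, 60, 70, 80, 90):
    print(x1, round(required_second_score(x1, 10.0, 0.2, 0.25, 5.0), 1))
```

With both slopes positive, a higher score on the first objective lowers the score required on the second.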

Additionally, we conducted an evaluation of the quality of the models. We looked 
at how well the models described the variation of the DLPT and whether or not there was 
a negative correlation between the performance objectives and the DLPT. The negative 
correlation of a performance objective with the DLPT does not make “good” sense by
itself, because it states that students scoring higher on a performance objective are
expected to score lower on the DLPT. We believe there is a more complicated
explanation involving interactions between performance objectives, but
since we did not allow interactions in these models, some models show a negative
correlation.

In the second phase of this study, stepwise multiple linear regression was used to 
determine the correlation of performance objectives and DLPT scores with “block” tests 
at GAFB for each service. In this phase, attention was restricted to the Russian language. 
We looked for consistency in the performance objectives and the DLPT to determine if 
there was one objective that best determined success at GAFB. 

In the first phase, the study found that in some languages the performance 
objectives were better predictors of success on the DLPT than in other languages. Polish
and Japanese were languages where the performance objectives were “good” predictors 
for performance on the DLPT. Vietnamese was a language where the performance 
objectives were “poor” predictors for performance on the DLPT. 

There are ten performance objective tests. Numbers 1 through 4 are intended to
measure listening skills; numbers 5 through 8 are aimed at measuring reading skills; and
numbers 9 and 10 measure speaking. We found that, across all languages, performance
objectives 1, 3, and 7 appeared most frequently as predictors of success on the DLPT_L.
Performance objectives 2, 5, and 7 were the best predictors for success on the DLPT_R.
Finally, performance objective 1 was the most frequent predictor for success on the
DLPT_S.

These results are slightly different when the languages are divided by categories of
difficulty (I to IV, I being easiest) and by alphabet (Roman and non-Roman), but the
general conclusion remains valid: the performance objective tests do not seem to measure
what they were designed to measure. Furthermore, different performance objective tests
appear as the “best” predictors of DLPT test scores in different languages. For example,
performance objective 9 was the best predictor for DLPT_L in Czech, while performance
objective 7 was the best predictor for DLPT_L in Hebrew.

For the GAFB, again some of the proficiency tests were better predictors of 
success than others. The best predictors of success on the “block” tests are different for 
the three courses (Army, Navy/Marine Corps and Air Force). 


The study shows that the performance objectives are not measuring the listening,
reading, and speaking skills intended, nor do they seem to measure the same things in
different languages. We recommend that DLIFLC review and validate its performance
objectives. If cut-off scores for performance objectives need to be assigned, DLIFLC can
assign them using the findings of this thesis.
I. INTRODUCTION


A. BACKGROUND 

The Defense Language Institute Foreign Language Center (DLIFLC) trains 
students in 21 foreign languages for the Department of Defense (DoD). The National 
Security Agency (NSA) and Defense Intelligence Agency (DIA) are responsible for 
setting the training objectives for students entering professional fields in intelligence. 

In the early 1990s these two communities developed specific training objectives 
for students entering the basic language program. These objectives were written and 
refined over a period of several years with the assistance of numerous experienced 
personnel in the various fields that the students were preparing to enter. With NSA and 
DIA concurrence, DLIFLC combined the requirements from both communities into a 
single set of program objectives for all students. These objectives are referred to as Final 
Learning Objectives (FLO). 

There are four types of FLOs: proficiency objectives, which include the general 
language skills of reading, listening and speaking; performance objectives, which focus 
on job-specific skills that involve foreign language use such as transcribing, summarizing 
text, translating, etc.; content objectives, which include background knowledge of the 
target country related to interpretation of foreign language materials, such as knowledge
in the area of politics, military topics, culture, geography and technology; and enabling
objectives, which incorporate knowledge of colloquial language, dictionary usage,
number drills, and future transliteration system. Test instruments and test data are
available for measuring only the first two kinds of FLOs, proficiency objectives and
performance objectives. This study is concerned only with data on these two FLOs.

DLIFLC measures attainment of proficiency FLOs through the Defense Language 
Proficiency Tests (DLPT) and the performance objectives through ten performance 
objective tests. Since 1958 various formats and scoring systems have been used in 
different versions of the DLPT to measure general language proficiency. The current 
DLPT consists of two multiple-choice tests and an interview. The multiple-choice tests 
measure proficiency in listening and reading and the interview measures proficiency in 
speaking. 

Instruction in the performance objectives was introduced in 1987, and test batteries
for 13 languages were developed and fully implemented by 1994. For each language,
there is a series of ten performance objective tests. These tests are task-oriented,
constructed-response tests, as opposed to multiple-choice tests. For example, examinees 
are asked to produce an English summary of a conversation, transcribe text in the target 
language, read legible native handwriting, translate transcribed materials, etc. 

DLIFLC has three major types of students: cryptologists, human intelligence
personnel and Foreign Area Officers. Approximately 70 percent of the students are
cryptologists. The majority of cryptology students attend a follow-on school at
Goodfellow Air Force Base (GAFB) in San Angelo, Texas, where they receive
job-specific training involving foreign language skills. The cryptology students attending
GAFB are drawn from all four uniformed services. Of the twenty-one languages taught at
DLIFLC, the ten highest enrollment languages have a follow-on component at GAFB.
Graduates of the other twelve languages, which account for approximately 30 percent of
the enrollees, do not go to follow-on training at GAFB. Because job requirements can
vary for the different services, in some languages GAFB offers different courses for 
members of the different services. Each GAFB course consists of a series of “blocks” of 
instruction reflecting training objectives for that course. GAFB evaluates its training 
within these courses with tests based on these blocks, some of which are multiple choice 
and some of which are of the constructed-response type. 
B. PROBLEM 

In the past, general proficiency in listening, reading, and speaking skills has been 
the focus of language learning and testing in the DoD. A general rule applicable for all 
services is that cryptology students with a minimum acceptable DLPT score (measuring 
general proficiency) are eligible to attend follow-on training at GAFB. 

DoD does not have a corresponding rule establishing minimum acceptable 
performance objective scores for entry into GAFB. Individual service commanders at 
DLIFLC may exercise some discretion in borderline cases where general minimum DLPT
requirements have not been met. They may take into account a variety of factors, such as
motivation, military bearing and performance objective scores, to grant waivers for
attending GAFB follow-on training.

Some GAFB “block” tests are similar to performance objectives tests in format 
and language skills addressed; for this reason the DLIFLC Evaluation Division believes 
the performance objectives test scores can be an extremely important factor in 
determining the probability of success in follow-on training and ultimately the field. 

The purpose of this study is to determine how the performance objectives test
scores relate to success on the DLPT and how combinations of DLPT and performance
objective tests might relate to success in follow-on training at GAFB. For the purpose of
this study, success at GAFB is defined by on-time graduation, number of required
mandatory study hours, and performance on “block” tests.

The results of this study will assist Service Commanders in interpreting the 
meaning of performance objectives tests when making decisions about waivers for 
admission to GAFB follow-on training. The results may also be of interest to language 
departments and service commanders in making decisions about recycling students prior 
to graduation. Recycling means returning a borderline student to an earlier point in the 
course in a trailing class in order to give the student time to work on academic 
weaknesses. The results of this study might also help interpret the meaning of tests given 
prior to graduation that are similar to either the DLPTs or performance objectives in 
format and content. 

C. ORGANIZATION OF THESIS 

Chapter II contains a review of the literature on prediction of success at DLIFLC.
Chapter III describes the data and variables considered. Chapter IV describes
the method used to analyze the data. Chapter V contains the findings of the analysis.
Chapter VI contains the summary, conclusions and recommendations.
The statistical package used in this thesis is SPSS (Ref. 10). The Appendices
present an example of the SPSS output, graphs that show predicted values on tests to
achieve a predetermined probability of passing designated DLPTs or “block” tests, and
the S-plus code used to create the graphs.

















II. LITERATURE REVIEW


While there is a large literature on the learning of language in civilian schools,
language learning in the military has gone largely unanalyzed. The issue of predicting
language learning success has been analyzed in a few other studies. However, a formal
study dedicated to the correlation of performance objectives and proficiency FLOs with
follow-on training measures has not been performed, nor has a formal study been
conducted on the correlation of performance objectives and proficiency FLOs within each
language. The following are brief descriptions of previous research on predicting
language learning success conducted at DLIFLC.

A. LANGUAGE SKILL CHANGE PROJECT 

The Army Research Institute for the Behavioral and Social Sciences and the 
DLIFLC conducted a joint research effort to determine the effectiveness and efficiency 
with which foreign language skills are learned, retained, and applied to job 
responsibilities in the Army. The specific objectives of the study were to 1) track changes 
in language proficiency over time, 2) identify factors related to changes in proficiency, 
and 3) better understand predictors of language learning at DLIFLC. The Language Skill
Change Project (LSCP) (Ref. 3) was a longitudinal study that followed approximately
2000 Army linguists throughout their foreign language training and in their first tour of
duty in the field. Data were collected from the linguists at seven different times starting
from the first week of their language training at DLIFLC and extending until
approximately three years after their graduation from DLIFLC.





Report II of LSCP, entitled “The Prediction of Language Learning Success at
DLIFLC” (Ref. 6), indicated that success can be predicted by non-cognitive measures.
The findings support the continuation and expansion of linguist selection procedures based
on cognitive ability for admission to DLIFLC training. Of all the types of student 
characteristics considered in this research, the measures of the different cognitive 
aptitudes had the greatest success as predictors of performance. In developing improved 
selection procedures, however, some consideration should be given to the possibility of 
incorporating at least some non-cognitive attributes as well. Specifically student 
attitudes, motivation and initial learning strategies made significant contributions to the
prediction of listening and reading skills. Motivation provided relatively important
prediction increments to the less predictable speaking skill. Report II of LSCP,
“Training Approaches for Reducing Student Attrition From Foreign Language Training”
(Ref. 5), showed that in the samples studied, a Defense Language Aptitude Battery
(DLAB) score of 100 was pivotal in determining trends for attrition. Students with scores
of 100 or below were more likely to attrit than those students with scores above 100.

B. OTHER DLIFLC RESEARCH

1. “Language Choice and Performance.” 

The Research and Analysis Division (ESR) of the DLIFLC was tasked to
investigate whether the level of proficiency attained by students in the Basic course has a
relationship to whether or not the language assigned was their language of choice. The
study (Ref. 4) was conducted on a sample of Fiscal Year (FY) 1990-1994 graduates of
the DLIFLC Basic course in eight languages. This study indicated that there was minimal
correlation between ability to choose which language to study and subsequent
performance in the language studied; thus, other factors should be chosen to explain
training outcomes.

2. “The Effects of Length of Service and Prior Language Study at DLI 
on DLPT Attainment.” 

This study (Ref. 7) was conducted by ESR to compare the DLPT performance of
enlisted military personnel who had four or more years of service to that of initial entry
trainees (IET), who had less than one year of service before enrolling in the DLIFLC Basic
Language Course. Additionally, the study compared those who had studied a language at
DLIFLC prior to their current enrollment with those who had not. This study showed no
significant difference in performance between IETs and those personnel with more than
four years of service. The results do, however, strongly support the use of previous
foreign language study as a useful predictor of subsequent language learning success.
Aptitude measures had statistically significant correlations with proficiency in all three
skills.

3. “Relationships of Language Aptitude and Age to DLPT Results
among Senior Officer Students in DLIFLC Basic Language Courses.”

ESR conducted this study (Ref. 8) pursuant to a request from the DLIFLC
Command Group to examine the relationships of age and aptitude among all basic course
students in paygrades O5 and O6. The results were that correlations of age with DLPT
measures of listening, reading and speaking were not statistically significant.














III. THE DATA


Personal and career statistics of students who have attended DLIFLC and GAFB 
are maintained in a database at DLIFLC. The data for this study were obtained from this 
database. 

A. THE POPULATION 

The majority of the training at DLIFLC is conducted in the basic acquisition 
courses of language instruction. The Basic course is largely composed of enlisted
military students who have one or fewer years of military service. 

In the first phase of the study, we examine the relationships between performance 
objectives in various languages and proficiency DLPTs for all students graduating from 
DLIFLC between the beginning of FY96 and the end of FY97. This data set includes
records for 5413 students. 

In the second part of the study, we consider both proficiency and performance 
FLOs as predictors of measures of success in follow-on training at GAFB for a subsample 
of the original population. This subsample includes only students of Russian. The 
dependent variables in this subsample were different for students in each Service, because 
GAFB has different courses with different criterion measures for the Army, Air Force, 
and Navy/Marine services. This overall Russian subsample included 516 records. 

B. THE VARIABLES 
1. First Portion of Study: Dependent Variables 

The dependent variables for the first portion of this study were the scores obtained
on the DLPT. The DLPT is used as the standard for successful completion of the initial
course of language instruction. There are three scores on the DLPT for each language:
the first is for listening, the second is for reading, and the third is for speaking. 
The DLPT speaking, listening, and reading scores are reported on the eleven-point
scale used within the U.S. Government and DoD; each point is called a “level score.”
The possible level scores are 0, 0+, 1, 1+, 2, 2+, 3, 3+, 4, 4+, and 5. Levels 3+, 4,
4+, and 5 are not awarded at DLIFLC for reading and listening;
however, the full range of scores may be awarded for the DLPT in speaking. The scale of level
scores indicates levels of proficiency for military linguists as defined by verbal 
descriptions approved by the Federal Interagency Language Roundtable. There is a 
general rule applicable to all Services that students with at least Level 2 in Listening, 
Level 2 in Reading, and Level 1 in Speaking are eligible to attend follow-on training at 
GAFB. Level 2 in reading is described as sufficient comprehension to read simple,
authentic written material in a form equivalent to usual printing or typescript on subjects
within a familiar context. A Level 2 student will therefore be able to read texts that are
normally presented outside of a classroom environment, for example a newspaper
clipping or business letter. A Level 2 listening score is defined as sufficient
comprehension to understand conversations on routine social demands and limited job
requirements (e.g., be able to understand face-to-face speech in a standard dialect,
delivered at a normal rate by a native speaker not used to dealing with foreigners, about
everyday topics). The speaking Level 1 is defined as the ability to satisfy minimum
courtesy requirements and maintain very simple face-to-face conversations on familiar
topics. For example, this speaker would be able to ask for help and verify comprehension
of a native speaker, but misunderstandings would be frequent. The DLPT speaking score
is obtained directly from an interview conducted by trained and certified language testers. 
The DLPT in listening and reading yields converted scores of 0 to 60, which correspond to level
scores ranging from 0 to 3. For this analysis, the converted scores were used for the
reading and listening tests.
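
The eligibility rule stated above (at least Level 2 in listening, Level 2 in reading, and Level 1 in speaking) depends only on the ordering of the eleven level scores. A minimal sketch of that check follows; the names `LEVELS`, `MINIMUM`, and `eligible_for_gafb` are hypothetical, and the sketch ignores the commander waivers discussed earlier.

```python
# The eleven level scores, in increasing order of proficiency.
LEVELS = ["0", "0+", "1", "1+", "2", "2+", "3", "3+", "4", "4+", "5"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# General minimum level scores for follow-on training, as stated in the text.
MINIMUM = {"listening": "2", "reading": "2", "speaking": "1"}

def eligible_for_gafb(scores):
    """True if every skill meets or exceeds its minimum level score."""
    return all(RANK[scores[skill]] >= RANK[floor] for skill, floor in MINIMUM.items())

print(eligible_for_gafb({"listening": "2", "reading": "2+", "speaking": "1"}))
print(eligible_for_gafb({"listening": "1+", "reading": "2", "speaking": "2"}))
```

Comparing positions in an ordered list avoids treating labels such as "1+" as numbers.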

2. First Portion of Study: Independent Variables

The independent variables used in the first portion of the study for each language 
sample were the ten performance objectives test scores. (While other variables might 
have been considered, the goal as stated by DLIFLC-ESR is to make predictions
based on scores on these tests.) The possible scores on each of the ten performance 
objectives range from 0 to 100. Table 1 is a description of the performance objective
categories:

Table 1. Description of Performance Objective Categories

...    ...            ... broadcast or conversation.
...    ...            ... broadcast or conversation.
...    ...            Transcribe decontextualized numbers.
FSA    Reading        Answer content questions about a level 2 written text.
...    Reading        Read reasonably legible handwritten native text.
...    Translation    Translate level 2 text into idiomatic English.
...    ...            Biographical data interview.
FIOA   ...            Two-way interpretation.
...    Translation    Translate an English text into level 2 target language.

3. Second Portion of Study: Dependent Variables 

The dependent variables for the second portion of this study were the scores 
obtained on the GAFB “block” tests in the respective Russian courses for the various 
services, the total time to train at GAFB, and the number of hours required in the Special 
Individual Assistance (SIA) program. 

The “block” test scores are obtained from a variety of different tests. Some of
these tests yield a pass/fail score while others have a score that ranges from zero to
one hundred.


Special Individual Assistance is a program developed for those students who are 
having difficulty in the course of instruction. GAFB mandates special hours of additional 
help in the areas in which these students are having difficulty. 

4. Second Portion of Study: Independent Variables

The independent variables for the second portion of the study include both
performance objectives and DLPTs as discussed above, but only for the Russian
subsample.











IV. METHODOLOGY 


A. REGRESSION ANALYSIS MODEL 

Regression analysis models allow the forecaster to estimate the value of one 
variable based on its relationship to one or more other variables. Simple regression 
assumes that the functional relationship between two variables can be represented as a 
straight line. Each of the n observations is assumed to obey: 


$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n \tag{1}$$

where $Y_i$ is the $i$th value of the dependent variable, $X_i$ denotes the corresponding value of
the independent variable, $\beta_0$ is the point at which the straight line intersects the Y-axis, $\beta_1$
is the regression coefficient or slope of the line, and $\varepsilon_i$ is the “error” which describes the
departure of this observation from the line. Simple regression uses the ordinary least
squares (OLS) method to find the equation for a straight line which most closely
approximates the underlying data set (Ref. 2, pp. 30-33). Multiple regression is identical
to the simple regression model except that the model uses multiple (say, $k-1$) predictors
(X’s) for each data point. The least squares method then fits a plane rather than a straight
line:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{k-1} X_{i,k-1} + \varepsilon_i, \quad i = 1, \ldots, n \tag{2a}$$

or, in matrix notation,

$$Y = X\beta + \varepsilon, \tag{2b}$$

where $Y$ is an $n$-vector of observations of the dependent variable; $X$ ($n \times k$) is the matrix
of observations of independent variables (here including a column of 1’s for the
intercept), $\beta$ is the $k$-vector of regression coefficients (here including the intercept, $\beta_0$),
and $\varepsilon$ is the $n$-vector of “errors” (Ref. 2, p. 66).
B. THE GENERAL MODEL 

Ordinary least squares multiple linear regression analysis is used to fit the model 
of each dependent variable to the data available. The least-squares principle specifies that 
the $b_j$’s (estimated coefficients) are to be chosen so as to minimize the sum of squared
differences between the observed values and the estimated values of the dependent
variable. This quantity is known as the sum of squared residuals (RSS):

$$RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{3}$$

or in matrix terms:

$$RSS = (Y - Xb)^T (Y - Xb) \tag{4}$$

where the superscript “T” denotes transposition.

We estimate the vector $\beta$ (the true coefficients) by the solution, $b$, to the
following equation (Ref. 2, p. 72):

$$b = (X^T X)^{-1} X^T Y \tag{5}$$
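
The least-squares solution in equation (5) can be illustrated with a short, dependency-free sketch. It solves the normal equations (X'X) b = X'Y by Gaussian elimination rather than forming the inverse explicitly, which gives the same b whenever X'X is nonsingular. The function names and the data are made up for illustration; the thesis itself used SPSS.

```python
def solve_linear_system(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(predictors, y):
    """Return b solving (X'X) b = X'Y, with a column of 1s prepended to X."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from y = 5 + 2*x1 + 0.5*x2 (no error term).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [5 + 2 * x1 + 0.5 * x2 for x1, x2 in rows]
print([round(b, 6) for b in ols(rows, y)])
```

Because the example data contain no error term, the fitted coefficients reproduce the generating values up to rounding.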

C. THE STEPWISE REGRESSION MODEL 

Stepwise regression is an automatic method of selecting the set of independent
variables to include in a multiple linear regression model. This procedure
can be described as a step-up procedure with a step-down adjustment. First, starting with
no X variables in the model, the computer program chooses the variable that has the
largest simple correlation with Y. Thereafter, it either adds the X variable that produces
the largest further increase in $R^2$ or removes the variable that will least reduce $R^2$ (see
section D). At each step the p-value for the usual F-test is computed. The procedure
stops when a specified significance level, .05 for forward selection and .1 for backward
elimination, cannot be met by any further inclusion or exclusion of a variable (Ref. 1, p.
123). This selection procedure does not guarantee optimum subsets, but it does overcome
some of the major deficiencies encountered in other methods and is the best method
offered by SPSS.
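
The step-up and step-down logic can be sketched as below. This is not the SPSS routine. As a simplifying assumption, the sketch compares partial F statistics with fixed thresholds (`f_enter`, `f_remove`, both hypothetical) instead of the p-value criteria of .05 and .1 used in the study, because the Python standard library has no F distribution. It also assumes the candidate predictors are not collinear. The data are made up.

```python
from itertools import product

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(rows, y, subset):
    """Residual sum of squares for an OLS fit on an intercept plus chosen columns."""
    X = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))

def stepwise(rows, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    n, p = len(y), len(rows[0])
    selected = []
    for _ in range(max_steps):
        changed = False
        # Step up: find the excluded variable with the largest partial F.
        current = rss(rows, y, selected)
        df = n - len(selected) - 2  # error df once one more predictor is added
        best_f, best_j = 0.0, None
        for j in range(p):
            if j in selected or df <= 0:
                continue
            new = rss(rows, y, selected + [j])
            f = (current - new) / (max(new, 1e-12) / df)
            if f > best_f:
                best_f, best_j = f, j
        if best_j is not None and best_f >= f_enter:
            selected.append(best_j)
            changed = True
        # Step down: drop the weakest included variable if its partial F is small.
        if selected:
            current = rss(rows, y, selected)
            df = n - len(selected) - 1
            partial = {}
            for j in selected:
                reduced = rss(rows, y, [s for s in selected if s != j])
                partial[j] = (reduced - current) / (max(current, 1e-12) / df)
            weakest = min(partial, key=partial.get)
            if partial[weakest] < f_remove:
                selected.remove(weakest)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Made-up orthogonal design: y depends on columns 0 and 2 but not on column 1.
rows = list(product((-1.0, 1.0), repeat=3))
y = [3 + 2 * a + 1 * c + 0.5 * a * b * c for a, b, c in rows]
print(stepwise(rows, y))
```

Keeping the removal threshold below the entry threshold mirrors the .05 and .1 pairing in the text: a variable that has just entered cannot be removed in the same pass.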

D. THE $R^2$ STATISTIC

A commonly accepted statistic for measuring the value of a regression equation is
the $R^2$ statistic. The $R^2$ statistic measures the proportion of total variation about the mean
which is accounted for by the regression, equation (6). This statistic should be viewed
with some caution, because it can be made arbitrarily high by adding additional variables;
nonetheless it is widely used and so we report it here.

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{6}$$

where $\hat{Y}_i$ is the $i$th predicted value, $\bar{Y}$ is the mean of the dependent variable, and $Y_i$ is the
$i$th actual value (Ref. 2, p. 39).
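
As a worked illustration of the statistic, the sketch below computes the ratio of explained variation to total variation for a small made-up data set. The predicted values are assumed to come from a least-squares fit with an intercept; here the fitted line is y = 2.2 + 0.6x, which is the least-squares line for these five points.

```python
def r_squared(actual, predicted):
    """Explained variation about the mean divided by total variation about the mean."""
    mean_y = sum(actual) / len(actual)
    explained = sum((p - mean_y) ** 2 for p in predicted)
    total = sum((a - mean_y) ** 2 for a in actual)
    return explained / total

# Made-up data; 2.2 + 0.6 * x is the least-squares line for these points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]
print(round(r_squared(y, fitted), 3))
```

For these points the explained sum of squares is 3.6 and the total is 6, so 60 percent of the variation is accounted for by the line.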

E. THE t-TESTS AND F-TEST

The OLS yields estimates ($b_j$) for our regression coefficients $\beta_j$. The estimated
standard error, $\hat{\sigma}$, of the regression is

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k}} \tag{7}$$

where $k$ is the number of estimated parameters and $n$ is the number of data points (Ref. 2,
p. 36).

Assuming that errors are independently and identically distributed as $N(0, \sigma^2)$, the
statistic

$$t = \frac{b_j}{SE_{b_j}} \tag{8}$$

where $SE_{b_j}$, the standard error for the estimated coefficient $b_j$, is the square root of the
$j$th diagonal element of the estimated covariance matrix of the parameters,

$$\hat{\sigma}^2 (X^T X)^{-1}, \tag{9}$$

follows a Student’s t-distribution with $n-k$ degrees of freedom (where $n$ is the number of
data points) under the null hypothesis

$$H_0: \beta_j = 0. \tag{10}$$

The p-value is the estimated probability of obtaining results as extreme as the
sample or more extreme when the data are drawn from a population in which $H_0$ is true. A
low p-value indicates that it is unlikely that such a sample would come from a population
where $H_0$ is true; therefore we can reject the null hypothesis and state that it is likely that
there is a linear relationship between the dependent and independent variables. The
significance level used to reject the null hypothesis in this study is .05. Therefore any p-value
less than .05 is said to be “statistically significant.”
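
For the special case of one predictor, the t statistic for the slope can be computed directly, since the slope's diagonal element of the inverse of X'X reduces to 1 / sum((x - mean_x)^2). The sketch below uses made-up data and a hypothetical function name. It stops at the t statistic, because the Python standard library has no Student's t distribution for the p-value.

```python
import math

def slope_t_statistic(x, y):
    """Return (b1, SE of b1, t) for a simple regression of y on x."""
    n, k = len(x), 2  # k counts the intercept and the slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - k))  # estimated standard error of the regression
    se_b1 = sigma_hat / math.sqrt(sxx)    # standard error of the slope
    return b1, se_b1, b1 / se_b1

# Made-up data, for illustration only.
b1, se_b1, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 3), round(se_b1, 3), round(t, 3))
# With n - k = 3 degrees of freedom, the two-sided 5 percent critical value
# from a t table is about 3.18, so this slope would not be called significant.
```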
An F-statistic can test hypotheses regarding sets of parameters. The null
hypothesis for this test is

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{k-1} = 0. \tag{11}$$

Using the same philosophy as with the t-statistic, we reject the null hypothesis if the
p-value is less than .05:

$$F = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 / df_R}{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 / df_E} \tag{12}$$

where $df_R$ and $df_E$ are the degrees of freedom for the regression and the error respectively
(Ref. 2, pp. 43-45). We discuss the assumptions and limitations of the
model in Sections F and G.
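
The overall F statistic can be computed from the actual and fitted values alone. The sketch below assumes the fitted values come from a least-squares fit with an intercept and k estimated parameters, so that the regression has k - 1 degrees of freedom and the error has n - k. The data and the function name are illustrative only.

```python
def f_statistic(actual, predicted, k):
    """Regression mean square divided by error mean square."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    ss_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    df_regression, df_error = k - 1, n - k
    return (ss_regression / df_regression) / (ss_error / df_error)

# Made-up data; 2.2 + 0.6 * x is the least-squares line for these points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]
print(round(f_statistic(y, fitted, k=2), 3))
```

With a single predictor, the F statistic equals the square of the slope's t statistic.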

The use of a regression model to analyze a set of data is subject to a number of 
assumptions and limitations (Ref. 2, pp. 110-112). 
F. ASSUMPTIONS 

1. Fixed X 

In this study, the X values are not fixed as part of the design. Therefore we 
proceed with the analysis conditional on the X’s we actually observe.

2. Errors are normally distributed with a mean of zero

This means that over the long run, sample estimates ($b_j$) will center on the true
parameter value ($\beta_j$). A probability plot and histogram of residuals are examined to verify
that errors are Normally distributed. These plots are produced as a matter of course by
the SPSS software; see the example in Appendix A. In general, the assumption of
Normality seems to be approximately correct. The assumption that the mean of the errors
is zero cannot be tested, since the residuals always have mean 0; however, the
consequences of a non-zero mean are limited to a bias in the intercept ($\beta_0$) term.





3. Homoscedasticity (errors have constant variance) 

The third assumption is that the variance of the regression errors is constant. The 
variance of these errors, also known as residuals, must remain constant over the entire 
range of values for the independent variable. Variables with non-constant variances can 
give significance tests that are meaningless. To verify that homoscedasticity exists, 
thereby validating the assumption of constant variance, a residual versus predicted values 
plot is observed. (See Appendix A for an example.) The plot should show a random 
pattern, and this assumption generally appears valid. 

4. Errors are uncorrelated with each other (no autocorrelation) 

The fourth assumption we used is that the errors are independent of one another. 
This assumption should be safe because the observations are not collected at points 
adjacent in time or space. Interestingly, the usual Durbin-Watson test showed occasional
departures from this assumption, but given the nature of the data it is difficult to explain 
serial correlation. We proceed as if this assumption were correct. 
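For readers unfamiliar with the Durbin-Watson statistic, a minimal Python sketch follows. It uses the standard definition (sum of squared successive differences of the residuals divided by the sum of squared residuals); the function name and the toy residuals are hypothetical and are not taken from the study's data.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    serial correlation, values near 0 positive correlation, and values
    near 4 negative correlation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Toy residual sequences, for illustration only.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strongly negative correlation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strongly positive correlation
```

Because the statistic depends on the order of the observations, it is only meaningful when that order reflects time or space, which is why the departures reported above are hard to interpret.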

G. LIMITATIONS 
1. Omitted Variables

If other variables affect both X and Y, $b_j$ may substantially overstate or understate
the true relationship between X and Y. Of course we cannot identify these variables.

2. Nonconstant Variance of Errors (Heteroscedasticity)

If the variance of the errors were to vary with the level of X, the usual standard
errors, hypothesis tests, and confidence intervals would not be trustworthy. In small
samples it can be difficult to assess the residual versus predicted plots. The assumption
of homoscedasticity does seem to hold in the large-sample cases.

3. Nonlinear Relationships

OLS finds the best-fitting straight line. This can be misleading if the expected
value of $Y_i$ is a nonlinear function of X. A pattern in the plot of residuals versus fitted
values (see Appendix A) would be evidence of a violation of this assumption, but such a 
pattern was not seen. 

4. Non-Normal Errors

The usual t and F procedures assume that the residuals are Normally distributed.
This assumption seemed to hold up in the large-sample cases and is difficult to assess in 
smaller samples. When errors are non-Normal, p-values from these procedures are 
untrustworthy. 

5. Influential Cases 

OLS can be affected by outliers, which can pull the line up or down and 
substantially influence all results. This was examined primarily in the unusual cases 
where performance objective coefficients were negative. There was no evidence of 
recording errors in the data. 
H. DETERMINATION OF PROBABILITY SCORES

1. Single Main Effect Models 

The determination of performance objective cut-off scores for the models with
one main effect was conducted using the assumption that the errors in the model are
Normal, thereby ensuring that the predicted DLPT scores are also Normal. For any
specific (row) vector of independent variables $X_0$, the model predicts the value
$\hat{Y}_0 = X_0 b$. The standard error of this prediction is given by

$SE(\hat{Y}_0) = \hat{\sigma} \left(1 + X_0 (X'X)^{-1} X_0'\right)^{1/2}$

(Ref. 2, p. 79).
The distribution of the DLPT for a specific performance objective score is then:

$N\left(X_0 b,\; SE(\hat{Y}_0)^2\right)$ (13)

and the quantity $(Y_0 - X_0 b)/SE(\hat{Y}_0)$ should follow the Standard Normal. We seek the
performance objective score for which we predict an 80% chance of reaching a
pre-determined cut-off (a passing score) on the DLPT. Thus we have

$P\left(\dfrac{Y_0 - X_0 b}{SE(\hat{Y}_0)} > \dfrac{\text{cut-off} - X_0 b}{SE(\hat{Y}_0)}\right) = 1 - \Phi\left(\dfrac{\text{cut-off} - X_0 b}{SE(\hat{Y}_0)}\right) = 0.8$ (14)

from which we get

$X_0 b = \text{cut-off} - SE(\hat{Y}_0) \times \Phi^{-1}(0.2)$. (15)

In a model with only one independent variable, we can then find the performance
objective score for which the predicted probability of a passing score (40 for DLPT_R or
DLPT_L, 20 for DLPT_S) is 80%. In fact, we draw a graph of performance objective
score (X) against predicted probability of passing for every X, for every model with only
one main effect, and plot them in Appendices B (for DLIFLC) and C (for GAFB). These
plots were constructed by the software package S-Plus (Ref. 9).
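The cut-off calculation for a single-predictor model can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the S-Plus code used in the study: the function names, the synthetic scores, and the whole-number search grid from 0 to 100 are all hypothetical. Because the standard error of prediction depends on the score itself, the sketch scans candidate scores rather than solving Equation (15) in closed form.

```python
import numpy as np
from statistics import NormalDist

def pass_probability(x0, xtx_inv, b, sigma_hat, cutoff):
    """Predicted P(Y0 > cutoff) for the row vector x0, assuming Normal errors."""
    x0 = np.asarray(x0, dtype=float)
    se_pred = sigma_hat * np.sqrt(1.0 + x0 @ xtx_inv @ x0)  # SE of prediction
    z = (cutoff - x0 @ b) / se_pred
    return 1.0 - NormalDist().cdf(float(z))

def cutoff_score(xtx_inv, b, sigma_hat, cutoff, target=0.8, scores=range(0, 101)):
    """Lowest score whose predicted pass probability reaches the target, or None."""
    for s in scores:
        if pass_probability([1.0, s], xtx_inv, b, sigma_hat, cutoff) >= target:
            return s
    return None

# Synthetic data standing in for one performance objective and one DLPT skill.
rng = np.random.default_rng(0)
score = rng.integers(0, 101, size=200).astype(float)
dlpt = 20.0 + 0.3 * score + rng.normal(0.0, 3.0, size=200)

X = np.column_stack([np.ones_like(score), score])
xtx_inv = np.linalg.inv(X.T @ X)
b = xtx_inv @ X.T @ dlpt
resid = dlpt - X @ b
sigma_hat = float(np.sqrt(resid @ resid / (len(dlpt) - X.shape[1])))

required = cutoff_score(xtx_inv, b, sigma_hat, cutoff=40.0)
```

A return value of `None` corresponds to the situation in which no score on the test yields a predicted 80% probability of passing.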

For example, see Figure 1 for the DLPT_L in Czech. A score of 93 or greater
on performance objective 9 is needed to have an 80 percent chance of reaching
a score of 40 or greater on the DLPT_L.





[Figure 1 plot. Panel title: Czech/L. Horizontal axis: Perf. Obj. 9 (0 to 100). Vertical axis: Prob. of scoring above 40. 80% Cutoff: 93.]

Figure 1. Probability of Scoring 40 or Greater on DLPT_L/Czech Given F9A


This graph has the sort of shape we expect: a student who scores poorly on test
F9A is predicted to have little chance of passing the DLPT_L and a student who does
very well is predicted to have a high chance of passing. Some of these graphs have a less
intuitively appealing shape, however. For example, it appears that most students pass the
DLPT_L in Tagalog, regardless of their scores on the “best” predictor, test F1A. On the
other hand, even a student who scores very well on test F7A does not have a predicted
probability of 80% of passing the Korean DLPT_S. See Appendix B.

2. Models with Two Main Effects 

A similar analysis can be done in a model with two main effects. In this case, $X_0$
contains two performance objectives and there will be an infinite number of combinations
of scores for which the predicted DLPT score yields an 80% chance of passing. We can
plot the “frontier” made up of all such combinations for every model with two main
effects. The S-plus code for developing this frontier graph is given in an appendix.
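The frontier idea can be illustrated with a short Python sketch. This is not the authors' S-plus code; the function names, the synthetic data, and the integer grid from 0 to 100 are assumptions made for the example. For each score on the first test, the sketch finds the lowest score on the second test that reaches the target probability.

```python
import numpy as np
from statistics import NormalDist

def pass_probability(x0, xtx_inv, b, sigma_hat, cutoff):
    """Predicted P(Y0 > cutoff) for the row vector x0, assuming Normal errors."""
    x0 = np.asarray(x0, dtype=float)
    se_pred = sigma_hat * np.sqrt(1.0 + x0 @ xtx_inv @ x0)
    return 1.0 - NormalDist().cdf(float((cutoff - x0 @ b) / se_pred))

def frontier(xtx_inv, b, sigma_hat, cutoff, target=0.8, grid=range(0, 101)):
    """List of (s1, s2): for each s1, the lowest s2 reaching the target probability.

    Values of s1 for which no s2 on the grid reaches the target are omitted.
    """
    points = []
    for s1 in grid:
        for s2 in grid:
            if pass_probability([1.0, s1, s2], xtx_inv, b, sigma_hat, cutoff) >= target:
                points.append((s1, s2))
                break
    return points

# Synthetic data standing in for two performance objectives and a block test.
rng = np.random.default_rng(0)
scores = rng.integers(0, 101, size=(200, 2)).astype(float)
outcome = 40.0 + 0.2 * scores[:, 0] + 0.2 * scores[:, 1] + rng.normal(0.0, 5.0, size=200)

X = np.column_stack([np.ones(len(outcome)), scores])
xtx_inv = np.linalg.inv(X.T @ X)
b = xtx_inv @ X.T @ outcome
resid = outcome - X @ b
sigma_hat = float(np.sqrt(resid @ resid / (len(outcome) - X.shape[1])))

points = frontier(xtx_inv, b, sigma_hat, cutoff=70.0)
```

Plotting `points` would give a frontier line of the kind described above; combinations above and to the right of it have a predicted probability greater than the target when both coefficients are positive.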

For example, as shown in Figure 2, for the GAFB Navy Russian students, some
of the combinations of scores on F8A and F7A for which the predicted probability of
scoring a 70 or greater on “block” test 27 is 80% are:

F8A = 60 and F7A = 7;
F8A = 40 and F7A = 16; and
F8A = 5 and F7A = 35.


[Figure 2 plot. Panel title: 80% Frontier for Russian/Air Force, Block 27. Axis label: Perf. Obj. 1.]

Figure 2. Eighty Percent Probability of Scoring 70 on Block Test 27/Navy Given
F8A and F7A

Of course, any combination of scores whose position on the graph is above and to
the right of the line leads to a predicted probability greater than 80%. Some interesting
features can be seen on these graphs (see Appendices E and F). For example, in a number
of the DLPT cases the frontier is very near the right-hand corner of the graph, showing
that very few combinations of scores yield a predicted probability of passing as high as
80%. Conversely, at GAFB it is often the case that every student passes, so that the
frontier coincides with the co-ordinate axes. (In those cases no picture is supplied.) As
discussed in section V.A, it sometimes happens that the regression coefficients are
negative. The effect of this on the frontier graph can be seen in, for example, Czech on
the DLPT_R. The frontier has a positive slope, indicating that students with higher
scores on F1A need higher scores on F5A to reach a predicted 80% probability of passing
the DLPT_R than students with lower scores on F1A. This result is clearly
counter-intuitive.


V. FINDINGS


The first criterion for selecting a model was that the F-statistic comparing the null
model to the model with a single term be significant at the 5% level, which indicates that
the model is better than simply using the mean of the dependent variable. Originally we
considered models with interactions and models growing out of factor analysis.
However, DLIFLC found these to be un-interpretable. Furthermore, the decreases in
standard error obtained with these models, compared to models with only main effects,
were minimal. Thus every model had only main effects for the independent variables.

Once a single-term model had been chosen, our second criterion came into play. That 
was that in our judgment, a decrease in standard error of less than 0.1 did not warrant the 
addition of another term to the model, even if that term was “statistically significant” by 
the regression F-test. Such a term was deemed to be of no practical significance. 

Starting with a one-term model, then, terms were added one at a time until adding a term
caused an improvement in standard error less than 0.1. For example, SPSS produced
seven possible models (all with significant F-statistics) for the Arabic DLPT_L model,
one each with one main effect, two main effects, and so on up to seven main effects. The
model with one main effect had a standard error of 4.17, the model with two main effects
3.74, the model with three main effects 3.57, and the model with four main effects 3.51.
Since the improvement in standard error from one main effect to two (4.17 - 3.74 =
0.43) was greater than 0.1, we then considered the model with two main effects. A
similar subtraction comparing the standard errors for models of size two and three (3.74 -
3.57 = 0.17) also gave a result greater than 0.1. For the third model, the difference (3.57
- 3.51 = 0.06) was less than 0.1; therefore the three-term model was chosen. The order of the
variables within the models is the order in which the stepwise regression entered the
variables. For example, in the Arabic model for DLPT_L, F2A was the first variable to
enter the model, then F7A, and lastly F1A. See output in Appendix A.
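The stopping rule described above can be written as a short Python sketch. The function name `choose_model_size` is hypothetical; the standard errors are the four Arabic DLPT_L values quoted in the text, and the 0.1 threshold is the one stated there.

```python
def choose_model_size(std_errors, min_improvement=0.1):
    """Return the number of main effects kept by the stopping rule.

    std_errors[i] is the standard error of the model with i + 1 main effects,
    in the order the stepwise regression entered the variables. A term is
    added only while it lowers the standard error by at least min_improvement.
    """
    size = 1
    while (size < len(std_errors)
           and std_errors[size - 1] - std_errors[size] >= min_improvement):
        size += 1
    return size

# Standard errors quoted in the text for the Arabic DLPT_L models.
arabic_dlpt_l = [4.17, 3.74, 3.57, 3.51]
chosen_size = choose_model_size(arabic_dlpt_l)
```

For the Arabic values the rule stops at three main effects, matching the model chosen in the text.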


A. LISTENING PROFICIENCY MODELS BY LANGUAGE 

A summary of the DLPT_L models by language is shown in Table 2. In that table,
“STD ERR” denotes the standard error of the regression, while “STD DEV” gives the
standard deviation of the responses. (This, of course, is also the standard error from the
naive model that includes only an intercept.) “ABC S/D” indicates whether alphabets are
similar to our own (that is, Roman) or different from it; “LAN CAT” gives the language’s
category of difficulty.
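The relationship between the “STD ERR”, “STD DEV”, and R² columns can be illustrated with a short Python sketch. It assumes, as is conventional but not stated in the text, that the standard error of the regression divides the error sum of squares by the number of observations minus the number of fitted parameters, and that the standard deviation divides the total sum of squares by the number of observations minus one. The function name is hypothetical; the inputs are the Arabic DLPT_L values (standard error 3.57, standard deviation 5.42, N = 712, three main effects plus an intercept).

```python
def r_squared_from_summary(std_err, std_dev, n, n_params):
    """Recover R^2 = 1 - SSE/SST from the summary columns.

    Assumes std_err**2 = SSE / (n - n_params) and std_dev**2 = SST / (n - 1).
    """
    sse = std_err ** 2 * (n - n_params)
    sst = std_dev ** 2 * (n - 1)
    return 1.0 - sse / sst

# Arabic DLPT_L: three main effects plus an intercept gives four parameters.
arabic_r2 = r_squared_from_summary(3.57, 5.42, n=712, n_params=4)
```

Under these assumptions the result rounds to .568, so a standard error close to the standard deviation signals a model that explains little of the variation in the responses.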

Originally we thought that the performance objective scores that would measure
listening proficiency were F1A, F2A, and possibly F3A and F4A. The variables that
occurred most frequently across all languages for the DLPT_L were F1A, F3A, and F7A,
as shown in Table 3. These variables appear to be the best predictors of performance on the
DLPT_L. Furthermore, among languages using similar alphabets (the Roman alphabet),
F1A and F3A appeared to be the best indicators, while for dissimilar alphabets F1A and
F7A appeared to be better indicators, as shown in Table 5. Additionally, F3A and F8A
appeared to be the best predictors in Category I languages. There is only one language in
Category II, so there was no analysis done for this category. In Category III languages,
F1A and F7A appeared to be the best predictors, and in Category IV languages F1A and
F3A were the best predictors. See Table 4. It is interesting to note that in some cases,
one or more of the regression coefficients is negative. This indicates (counter to our
expectations) that an increase in the performance objective score is associated with a
decrease in the predicted DLPT score. The reason for this result may be that there really
are interactions between independent variables that our model does not include. Our
standard errors of prediction are generally about as small as in models that include
interactions, however. (See also section V.E.2.) Additionally, we note that while the
performance objective scores are all on the same scale, the estimated coefficients can vary
by a factor of about one thousand (ranging from about 0.4 to about 0.0008). In each case,
though, the addition of a term reduces the standard error of the regression model by at
least 0.1.







Table 2. Summary of Models for DLPT_L 

LANGUAGE | EQUATION STD |N R° STD | ABC | LAN 
ERR DEV |S/D | CAT 

Arabic 27.708 + .116* F2A | 3.57 |712 | 568 | 5.42 IV 
“A” + 8.754*10° *F7A 

+ .477*10°7 *F1A 
a rR 5710 eee 

7 S 


Mandarin —| * F6A. + 6.597* 10 
is Gis * FIA 
sere PP
French 4.755 + .288 * F3A
"F" + .193 *F8A
Hebrew 22.940 + .265* F7A
ae 
Japanese ~.242+.458* F5A |2.44 | 23 784 15.11 
ny + .179 * F4A + .156 

* F7A — 321* FlA 
Italian 11.898 + .363 * F8A 


ee 


Korean 33.405 +.102* FIA |2.78 |427 | 342 [3.46 
"k” + 5.281*10°7 * F3A 


Persian-Farsi | 34.492 +.109* F2A | 3.05 | 223 
"p” + 6.024*10°7 * FSA : 
+ 5.081*10°7 * F3A 


~18.950 + .232 * 2.95 
‘Ee FIA +.719 * FI0A 
Spanish 30.549 + .131* F1A [3.98 [778 |. 
"SY + 5.371*107 * F3A 

+ 7.304*10°? * F7A | 

Russian 36.76+.126*FlA [3.52 |594 |.476 [4.91 
Tagalog 41.530 + 7.945*10~ | 2.10 359 | 2.54 
a 
Vietnamese 19.902 + .221* F9A | 3.86 
ey +9.917*107 * F1A
Table 3. Frequency of Variables in the DLPT_L Model





Table 4. By Category of Difficulty
Table 5. By Category of Alphabet





B. READING PROFICIENCY MODELS BY LANGUAGE 

Originally we thought that the performance objective scores that would measure
reading proficiency were F5A, F6A, and possibly F7A and F8A. The summary of the
DLPT_R models by language is shown in Table 6.

The variables that occurred most frequently for the DLPT_R were F2A, F5A,
and F7A, as shown in Table 7. These variables appear to be the best predictors of
performance. Among the languages using similar alphabets (the Roman alphabet), F8A
appeared to be the best indicator, while for dissimilar alphabets F2A, F5A and F7A
appeared to be better indicators, as shown in Table 9. Additionally, F8A appeared to be
the best for Category I languages. In Category III languages, F2A, F5A and F7A
appeared to be the best predictors, and in Category IV languages F5A and F7A were the best
predictors. See Table 8.

Table 6. Summary of Models for DLPT_R


LANGUAGE | EQUATION sk Tow [i [esr 
35.689 + .101* F7A + ai fale 452 ee Pe 
cA? 7.817*1077 * FSA 


Chinese- 36.562 + .124* F6A+ | 3.50 ere 


tr 
oe 













Mandarin 8.666*10 7 * F7A 
“C” 


Czech 41.239 + .120* FSA — 
a bi 5.130*10°7* FIA 
French —~5.585 + .248 * FIOA 
"RB" 
17.491 + .210* F6A + 
+ .103* F7A 
Japanese 
id 
F7A 
Korean 26.558 + .136* FSA + 
pe 6.303*10 7 * FIOA + 
6.923*107* FIA 
464 |.419 | 224 |6.01 
7.049*10°7 * F6A
137 *F2A - .122 *F4A
~16.160 + .207 * F8A 
ah fags 
Persian-Farsi | 33.812 + 9.762*107* 
Polish ~33.449 + .357* F3A | 2.99 | .663 | 13 
git + .580 * F10A
Spanish 37.917 + 8.466*10°* | 3. 343 | 776
1S? F2A + .110*F5A 
Russian 31.942 +7.164*107* | 3.31 | .526 | 582 | 4.93 
"R" F2A + 6.323*107 
*FT7A + 8.230*10°7* 
F5A + 7.872*10° 
*FLA 
Tagalog 34.599 + .169* F8A+ | 1.66 |.789 | 17 | 3.26 
"G" 123 * F2A — 


9.843*10 7 *F4A 


27.504 + .305 * F2A 70. | 327 12.53
Table 7. Frequency of Variables in the DLPT_R Model
Table 8. By Category of Difficulty


Category 4 (4) 


Frequency | Percent
Table 9. By Category of Alphabet

C. SPEAKING PROFICIENCY MODELS BY LANGUAGE

Originally we thought that the performance objective scores that would measure
speaking proficiency were F9A and F10A. The DLPT_S models are summarized by
language in Table 10. The variables that occurred most frequently for the DLPT_S
were F1A and F7A, as shown in Table 11. These variables appear to be the best predictors
of performance. Among the languages using similar alphabets (the Roman alphabet),
F1A and F8A appeared to be the best indicators, and for dissimilar alphabets F7A was the
better indicator of performance, as shown in Table 13. Additionally, Category I languages
did not show a dominant performance objective as a predictor. In Category III languages,
F1A was the best predictor, and in Category IV languages F1A, F7A and F10A were the
best predictors. See Table 12 below.

Table 10. Summary of Models for DLPT_S


LANGUAGE | EQUATION STD |R° N |STD | ABC 
ce nae el ae 
Arabic 7.333 + 9.536*1072 * pepe yey Pe 
AS FIA + 8.269*10°7 * 

F1OA 
Chinese- 15.384 + 5.786*107 | 2.65 |.144 | 223 | 3.00 
epee PP 
“co | 
a ate F9A 
French ~1.1019 + .146 * 2.97 |.279 | 123 |3.48 |S I
"F" F3A + .101 * F8A |
Table 11. Frequency of Variables in the DLPT_S Model


VARIABLES | FREQUENCY | PERCENT | | 




























Table 12. By Category of Difficulty 

| Category 1 (3) 
a ae ee ee ee es ee eee 
POA) oe a 
RPA [  - [| - f[ - [| - | - Ut - 
| FSA | - [| - [ -1 [125 [| - | - | 
FOA | = | eT 
[FAA | - | - [| 2 [| 2 | 2 | 50 | 
es ee > 
eee ee ee ee ee ee 








Table 13. By Category of Alphabet 


| ——s| SIMILAR ALPHABET (6) | DISSIMILAR ALPHABET (9) 
[Frequency | _Percent__| Frequency | ___—Percent_—| 
: | 








D. PROBABILITY OF PASSING DLPT 

The statistical software, S-plus, was used to determine cut-off scores for those
models with one and two main effects. In models with more than two main effects,
cut-off scores on the performance objective tests can only be shown in three (or more)
dimensions. These pictures are difficult to show and interpret.

The cut-off scores were calculated by assuming Normal performance objective
scores and utilizing the model and the standard error of the model. We calculated for each
language the performance objective score for which we predicted an 80 percent
probability of scoring a proficiency level of two. A proficiency level of two is
determined by a converted score of 40 or greater on the DLPT_L or DLPT_R, and a score
of 20 or greater on the DLPT_S. The results are shown for models with one main effect
in Table 14 for DLPT_L, Table 15 for DLPT_R, and Table 16 for DLPT_S. The results
for models with two main effects are shown in Appendix F.


Table 14. Eighty Percent Chance of Scoring 40 or Greater on DLPT_L Given: 






TAGALOG 
THAI
Table 15. Eighty Percent Chance of Scoring 40 or Greater on DLPT_R Given:





Table 16. Eighty Percent Chance of Scoring 20 or Greater on DLPT_S Given:


CHINESE- 
MANDARIN 
CZECH _ 





E. QUALITY OF MODELS 

1. R² as a Quality Indicator

Utilizing R² as an indicator of a “good” model, Figures 3, 4, and 5 show that some
languages appear to produce better models than others. The letters in quotes in Tables 2,
6, and 10 represent the language. Figures 3, 4, and 5 indicate that Japanese (“J”) and
Polish (“L”) have a high R² for both the DLPT_L and the DLPT_R models ($R^2_R$ “J” =
.664, $R^2_L$ “J” = .784; $R^2_R$ “L” = .663, $R^2_L$ “L” = .771), but the R² is not very high in the
DLPT_S ($R^2_S$ “J” = .220, $R^2_S$ “L” = .420). Tagalog (“G”) has a high R² for the DLPT_R
model and a moderate R² for the DLPT_S model, but not a very high R² for the DLPT_L
model ($R^2_R$ “G” = .789, $R^2_L$ “G” = .359, $R^2_S$ “G” = .507). Additionally, Vietnamese (“V”)
has a low R² for all three proficiency tests, DLPT_S, DLPT_L and DLPT_R ($R^2_R$ “V”
= .308, $R^2_L$ “V” = .248, $R^2_S$ “V” = .267). It is not clear to us why different languages should
have different predictability from their respective performance objective tests. This is
something that DLIFLC ought to investigate.


[Figure 3 plot. Panel title: Comparison of R2. Axes: Reading, Listening.]

Figure 3. Comparison of R² for DLPT_L and DLPT_R.

[Figure 5 plot. Panel title: Comparison of R2. Horizontal axis: Listening (0.2 to 0.8). Points are labeled with the language letters.]

Figure 5. Comparison of R² for DLPT_L and DLPT_S.

2. Models with Negative Main Effect Terms

In the Czech model, we saw that the results are very different from what we
expected. For example, the relationship of performance objective 1 with the DLPT_R is
“significantly” negative when performance objective 5 is in the model. Possibly the
performance objectives do not measure what they are supposed to, or we have seen a
result of low probability. More likely, there is an interaction occurring amongst these
variables, but when interactions are not allowed into the model, we get negative main
effects. For example, Equation (19) would be the Czech model with interactions allowed
(using the same criterion that allows in variables only if they reduce the standard error by
more than 0.1):

DLPT_R = 39.995 + 1.441*10° * F5F9 — 6.288*107* * F1F5 (19) 

It is certainly reasonable for an interaction to be negative. One interpretation is
that performance objective 1 is positively correlated with DLPT_R, and performance
objective 5 is too, but performance objective 1 and performance objective 5 themselves
have a highly positive correlation so that the effects when performance objective 1 and
performance objective 5 are both high are not additive. Thus, someone who does “really”
well on performance objective 1 and “really” well on performance objective 5 does better
than someone who does well on performance objective 1 and well on performance
objective 5, but the increase is not as much as one would expect.
F. GOODFELLOW AIR FORCE BASE FOLLOW-ON TRAINING 

The same procedures that were used for DLPT models were used for the
development of the GAFB models. One addition to the DLPT models was that the scores
of the DLPT were considered as independent variables for the prediction of success at
the follow-on training at GAFB.

Each Service has a different course, each with different lengths and different tests.
Therefore one cannot compare test scores for Army “Block 1” with Air Force “Block 1.”
A number of the “block” tests produced Pass/Fail grades.

The models in Tables 17, 19, and 21 were developed using SPSS for those “block” tests
which had variability in their scores. “NO MODEL” was placed in the “equation”
column of the table for those “block” tests with no variability, such as those for which every
grade was “Pass.” The data size for each course also varied, Army being the largest with
108 data points, Air Force with 35 data points and Navy/Marine Corps with 30 data
points.

The most frequent variables for the Russian models were: for the Navy/Marine
Corps F1A and F7A; for the Air Force F3A; and for the Army F5A, DLPT_S, and
DLPT_R, as shown in Tables 18, 20, and 22, respectively.





Table 17. Russian Navy/Marine Corps Models 


BLOCK TEST | EQUATION STD STD 

io JNOMODEL i (—C<~ssi‘(<i(i‘i‘imrSCsi‘“‘(‘*SCNCSC(‘C 
20 JNOMODEL —“‘“‘~wTSSSCdCC 
(4 = |NOMODEL —“(‘“‘*wTSSSCOCd 
5 === |NOMODEL i itsti‘“‘“Cé*rLSCOCOC;*TS:COC*;‘;CNSCOC*é«s 
6 ~~ ~=|NOMODEL tt s—“‘imECCOCdT:C (iG 
7 62.8384.193*FBA—si«60—«*|=28s* 83 
8 ———s* 80.4384+.499*DLPTS «(3.09 ~—S«[ 3143.60 
9 82.2604.123* FRA ——=*'«S22—«* 58 
Wi 60.117 +.334* FA s«d Ss «606—s«d CDC” 
4.77 

16 == =|NOMODEL ee ti—(iésrSC“(<‘C ézsLCONOCOCé#*dNCOC#NC(#C” 
19 «| 82.854 4.434 * FIA-.721*DLPT_S [5.12 [504 [6.96 
214 
(2300S SC INOMODEL—“( w™~é<C~sEsti‘“‘CS*@LC(’’S*CBNCNC(C(CSN 
(240 SS |NOMODEL i (sts—“<é‘irTSC‘“##SCNNNNNSEPENNC 
250 ===|NOMODEL i tt (_wsCdTC“‘C*sTCO!;*é*SCLCOCOC(‘C#S 
26  ===~=|NOMODEL ee t—“(“‘“C™é*sLCO!O!OC*#*dCOC#*RSCtstsCi‘<CY 
[MSHS |NOMODEL ss t~—“‘<‘mLSC(*CN’NCNSCUNNNCNCSCiCn 
ACTL | NO MODEL
Table 18. Frequency of Variables in the Goodfellow Russian Navy/Marine
Corps Model
Table 19. Russian Air Force Models


EQUATION STD R? 
ERR 


BLOCK TEST STD 


DEV 


i 


[NOMODEL ee t—“‘iYSC“‘(‘$’NSU“CDNSNUTTNWNC_ 

2 = |NOMODEL —t—“‘(CSLCOOCOC*#*dNCOWC#ésV,C( ‘S‘STC¥SW 
3. {Nomoprr —“(‘“C‘C‘CrLOUOU€dLOUOC~+«dr~SOwté‘ ‘OM 
258 
5S |NOMODEL— —=C~sC“‘(‘#YTCCSCWLNWNSNNCCSEUC(‘CU(CON 
6 | 82.125 4.145 * 3A 
73.607 + .203 * F7A 143 
8 s« 65.111 4.182 * F3A +.174 * BSA 449 |.577 [669 | 
9 | 86.659 + 108 * F8A 
134 
13 |NOMODEL i (is*~*sésESC‘“(‘(Y’CNWSUCNN’NS N-—s—S 
15 |NOMODEL—“(‘“C™CSC;*LSCOC#d#‘S*C OC‘C$SNCOCi‘(SSNN 
65.967 + .246 * F3A 262 
17 62.946 +.203 *F3A + .212 * F2A 
aaa a 


83.645 + .156 * F3A 


li 

S) 

~ 
Ni} fai 
al | 60 
Al jn 
= 
we 


bo 


BO PRO TRO TR [ee 
—/ 10100 


iN 
we) 


51.437 + 1.074 *DLPT_R — .387 * 6.53 AS51 
FSA — .968 * DLPT_S 


230 |NOMODEL Cd CdCC~“‘sUNNOC#” 

24 CINOMODEL Cd CdC“‘“‘C*LUOOCé#éS 
25 CC [NOMODEL Cs “‘SSCd|SCOt‘“‘éC]!#SOOC#*#*S 
MSH NOMODEL TCT 
ACTL 489.371 - 5.871 * DLPT_S + 5.719 * | 39.39 301
Table 20. Frequency of Variables in the Goodfellow Russian Air Force Model


VARIABLES | FREQUENCY | PERCENT 






PFOA | 
ane) ee a ee ee 
ae a a ee ee a 
ae <7 a i a er 
eS 7) a ee ee ee ee 
Po FSA | 
ee ee ee 


DLPT_R
Table 21. Russian Army Models




















BLOCK TEST | EQUATION STD R? STD 
| 
2 ~~ ~={|NOMODEL t—“‘“‘ézwYSCS*‘*dEC(#S*N‘NUCTCNC 
3. = ==|NOMODEL st ttiti—‘“‘(Cé*dLCOO!OCOC*T:COC(#‘*”UNCOC*#éCY” 
6 72.915+.435*DLPTRs«S.77_——«*SC«wNWG:—C=C=«*C(CwH26HC=Cd 
7 = {|NOMODEL i ( sts—“‘“LSC*é‘“C(*sLSCOCO;*‘CSC#”C‘(C‘(‘( “ 
8 ~~ ={NomMopeEr tt tt~—“—é‘~wSC‘C(“<‘C‘*sLCSCOCé‘éCZCNCCOC#éCS” 
9 | 75.234 +657 * DLPT_S-.137* FIA [8.66 | .093 
ii JNOMODEL i (_sss—CSsSti‘“<‘(‘“ CiSsrO!;*;*dCOCOC#CNW#C 
120 ~~~={|NOMODEL i Cid: (‘tC 
13 20.085 + .619* DLPT_S + .302* F4A | 8.82 229 10.32 
75.844 + .194* F7A 
88.355 + 8.861*10- * F5A 061 | 4.17 
16 = = = =~=|NOMODEL i “(_w™™CwdC“(<‘ Cé‘é*srL.O!OC~*rOCt;*S™” 
[NOMODEL ee —“(‘(sC 


40.114 + .233 * F5A+.590* DLPT S| 5.67 309 oo | 


— .127 * F7A + .165 * F4A 


(19 ——s| 72.785 = .337* FIOA +.308*F9A | 4.04 






NO MODEL ae 
3.839 — .500 * FIOA + .467 * FOA 375 | 
NO MODEL a 


Table 22. Frequency of Variables in the Goodfellow Russian Army Model


VARIABLES | FREQUENCY | PERCENT

G. PROBABILITY OF PASSING “BLOCK” TESTS

As previously shown in the first part of the analysis for the DLPT, the statistical
software, S-plus, was used to determine cut-off scores for those models with one and two
main effects in the GAFB models.

The cut-off scores were calculated by assuming Normal performance objective
scores and/or DLPT scores and utilizing the model and the estimate of the standard error
of prediction, as before, to calculate the score required for an 80 percent probability of
passing the “block” test. A passing score is 70 for the Navy/Marine Corps and Army, and
80 for the Air Force. The results are shown for models with one main effect in Table 23
for the Navy/Marine Corps, Table 24 for the Air Force, and Table 25 for the Army. (The
corresponding graphs are shown in Appendix C.) The results for models with two main
effects (in the form of “frontier graphs”) are shown in Appendix G.

In a number of block tests, the grades were numeric (that is, not “Pass/Fail”) and
yet every student passed. That leads to scores of zero in Tables 23-25. The implication is
that regardless of the score on the performance objective, the probability that a student
passes the “block” test is predicted as 100%. This explains graphs like the one for the
Navy and Marine Corps Block 9, for example. In those graphs the “80%” level is
reported as NA or 0 (the latter when a score only barely higher than 0 is required).

Table 23. Scores Required to Produce an 80% Chance of Scoring 70 or Greater
on Russian Navy/Marine Corps Block Tests:

















SCORE 

a. 2 ee ee 
sot SNA 
9 | 8 NA 
[nn a ee ae 
ae a ee eae 








Table 24. Scores Required to Produce an 80% Chance of Scoring 80 or Greater
on Russian Air Force Block Tests: 
















BLOCK FXA/ 
DLPT_ 
x 
6 | 3 {| 0 | 
oOo | 8 | NA 
Ce a ee 
Cy ee ae ee
Table 25. Scores Required to Produce an 80% Chance of Scoring 70 or Greater
on Russian Army Block Tests:


BLOCK | FXA/DLPT_X | SCORE

VI. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS


A. SUMMARY 

In this thesis, we sought to accurately depict how the performance objectives
correlate with the DLPT and how well the combination of the DLPTs and performance
objectives forecasts future performance at Goodfellow Air Force Base follow-on training.
The models for each DLPT and each “block” test were created utilizing the multiple linear
regression method. In Chapter I, the background of the DLIFLC and the tests that are
required for the analysis were discussed. In Chapter II, the previous studies were
summarized. Chapter III gave a discussion of the population and the variables
researched. In Chapter IV, the methodology for the model formulation was detailed. And
finally, in Chapter V, the models and the statistics utilized to evaluate the accuracy of the
models were summarized.
B. CONCLUSIONS 

The primary research questions in this thesis are 1) What are accurate cut-off 
scores for the performance objectives and the DLPT to predict success at GAFB follow- 
on course? 2) How good are performance objectives for predicting future performance? 

In some languages the performance objectives were better predictors of success on
the DLPT than in others. For example, for the Polish language in the DLPT_L and
DLPT_R, the R² statistic was high in both models and in the DLPT_S model, the R² was
moderately high. Thus, to the extent that the R² statistic is an accurate indicator of a
“good” model, the performance objective tests are an accurate predictor for the
DLPTs for the Polish language. However, the R² in the Vietnamese language was
relatively low for all three DLPTs and therefore the performance objective tests are not as
accurate a predictor for the Vietnamese language.

Overall, performance objective 1, performance objective 3, and performance 
objective 7 were the most frequent performance objectives used as predictors for the 
DLPT_L. Performance objective 2, performance objective 5, and performance objective 
7 were the most frequent performance objectives used as predictors for the DLPT_R. 
And finally, performance objective 1 was the most frequent performance objective test 
used as the predictor for success on the DLPT_S. 

However, when divided by category of difficulty, performance objective 1 was the
most frequent predictor of success on the DLPT_L for the more difficult languages. For
the DLPT_R, performance objective 8 was the best predictor for Category 1 languages,
performance objective 2 for Category 3 languages and performance objective 5 and
performance objective 7 for the Category 4 languages. And finally, for the DLPT_S,
performance objective 1 was the best predictor for the Category 3 languages and
performance objective 7 for the Category 4 languages.

When the languages were divided by type of alphabet, performance objective 1 
and performance objective 3 were the best predictors for the DLPT_L for similar 
(Roman) alphabets and performance objective 1 and performance objective 7 were the 
best predictors for the dissimilar (non-Roman) alphabets. For the DLPT_R, performance 
objective 8 was the best predictor for the similar alphabets and performance objective 2, 
performance objective 5, and performance objective 7 were the best predictors for 
dissimilar alphabets. For DLPT_S, performance objective 1 and performance objective 8 
were the best predictors for similar alphabets and performance objective 1, performance 
objective 7, and performance objective 10 were the best predictors for the dissimilar 
alphabets. 

In each language, different performance objectives were better predictors of DLPT 
test scores. For example, F9A was the best predictor for DLPT_L in Czech, whereas F7A 
was the best predictor for DLPT_L in Hebrew. These performance objective tests were 
designed to measure proficiency in either Listening, Reading, or Speaking. It appears that 
the performance objective tests are not measuring what they were intended to measure. 

For the GAFB courses, again some of the proficiency tests were better predictors of 
success than others. For the Navy/Marine Corps Russian course, performance objective 1 
and performance objective 7 were the best indicators of success on the “block” tests. 
Performance objective 3 was by far the best indicator of success on the “block” tests for 
the Air Force Russian course. And finally, performance objective 5, DLPT_R, and 
DLPT_S were the best indicators of success for the Army Russian course. Additionally, 
the proficiency tests at DLIFLC were not good indicators for predicting the number of 
mandatory study hours (“MSH”) and the actual course length (“ACTL”) for the GAFB 
Russian courses. This was mainly caused by a lack of variability in the course length and 
number of mandatory study hours in the available data. 

C. RECOMMENDATIONS 

My recommendation is that DLIFLC review and validate the performance 
objective tests to ensure that the tests measure the intended proficiency skills. With the 
models developed in this thesis, DLIFLC can predict test scores, but each language uses 
different performance objectives, with a different degree of error for each model. 
APPENDIX A. EXAMPLE SPSS OUTPUT 


LANG = Arabic 


Variables Entered/Removed (a, b) 

Model   Method 
1-7     Stepwise (Criteria: Probability-of-F-to-enter <= .050, 
        Probability-of-F-to-remove >= .100) 

[The Variables Entered and Variables Removed columns are not legible in the source.] 

a. Dependent Variable: DLPT_L 
b. LANG = Arabic 
Model Summary (h, i) 

Model   R                 R Square   F Change 
1       .640 (a)          .410       487.752 
2       .725 (b)          .526       171.802 
3       [illegible] (c)   .568        67.796 
4       .763 (d)          .583        25.750 
5       .772 (e)          .595        21.431 
6       .776 (f)          .602        12.075 
7       .778 (g)          .605         5.076 

[The Adjusted R Square, Std. Error of the Estimate, R Square Change, df1, df2, and 
Sig. F Change columns are not legible in the source.] 

a. Predictors: (Constant), F2A 
b. Predictors: (Constant), F2A, F7A 
c. Predictors: (Constant), F2A, F7A, F1A 
d. Predictors: (Constant), F2A, F7A, F1A, F5A 
e. Predictors: (Constant), F2A, F7A, F1A, F5A, F3A 
f. Predictors: (Constant), F2A, F7A, F1A, F5A, F3A, F10A 
g. Predictors: (Constant), F2A, F7A, F1A, F5A, F3A, F10A, F9A 
h. Dependent Variable: DLPT_L 
i. LANG = Arabic 
Coefficients (a, b) 

Unstandardized coefficients (B), as far as legible in the source: 

Predictor    Model 3     Model 6     Model 7 
(Constant)   27.708      23.859      27.079 
F2A          .116        .101        .104 
F7A          8.754E-02   4.648E-02   5.006E-02 
F1A          9.447E-02   7.704E-02   7.908E-02 
F5A                      4.593E-02   4.455E-02 
F3A                      2.856E-02   2.740E-02 
F10A                     4.021E-02   3.847E-02 
F9A                                  -3.69E-02 

Other legible entries: Model 1, F2A: Std. Error .011, t 22.085. Model 2, F2A: B .169, 
Beta .448, Sig. .000, Zero-order .640, Partial .493, Part .397. 

[The remaining entries for Models 1, 2, 4, and 5 and the remaining columns (Std. Error, 
standardized Beta, t, Sig., and correlations) are not legible in the source.] 

a. Dependent Variable: DLPT_L 
b. LANG = Arabic 


Residuals Statistics (a, b) 

                       Mean        N 
Predicted Value        44.19       712 
Residual               -8.86E-03   712 
Std. Predicted Value   -.001       712 
Std. Residual          -.003       712 

[The Minimum, Maximum, and Std. Deviation columns are not legible in the source.] 

a. Dependent Variable: DLPT_L 
b. LANG = Arabic 


Charts 

[Figure: Histogram. Dependent Variable: DLPT_L. LANG: AD Arabic] 

[Figure: Scatterplot. Dependent Variable: DLPT_L, plotted against Regression 
Standardized Predicted Value. LANG: AD Arabic] 
APPENDIX B. PROBABILITY CHARTS FOR DLPT FOR ALL MODELS WITH 
ONE MAIN EFFECT 


A. DLPT_L 
APPENDIX C. PROBABILITY CHARTS FOR “BLOCK” TESTS OF RUSSIAN 
GAFB MODELS WITH SINGLE MAIN EFFECTS 
APPENDIX D. S-PLUS FUNCTION FOR PROBABILITY GRAPHS 


function(lang, first, dlpt, data = big, crit = 40, prob = 0.8, 
    return.model = F, n = 20) 
{ 
# 
# jt3: Do two-d "prob. of passing | FLO" plot 
# 
# Arguments: lang: two-letter language abbreviation 
#            first: name of first FLO test 
#            dlpt: one-letter choice of dlpt 
# 
# Start by trying to handle zeros 
# 
    zeros <- data[, first] == 0 
# 
# Stick these things into frame 1. Don't ask. 
# 
    assign("lang", lang, frame = 1) 
    assign("zeros", zeros, frame = 1) 
# 
# Create the text of the model statement, and execute it. 
# 
    model.txt <- paste("lm(DLPT.", dlpt, " ~ ", first, 
        ", data = data, na.action = na.omit, ", 
        "subset = LANG == lang & !zeros)", sep = "") 
    out <- eval(parse(text = model.txt)) 
# 
# Set up a vector of FLOs in result[,1]. For each element in the 
# vector, find the predicted DLPT score and the associated SE of 
# prediction. Then compute the probability of passing the test. 
# 
    result <- matrix(0, n, 2) 
    result[, 1] <- seq(0, 100, length = n) 
    preds <- predict(out, cbind(1, result[, 1]), se.fit = T) 
    sds <- sqrt(preds$resid^2 + preds$se^2) 
    result[, 2] <- 1 - pnorm((crit - preds$fit)/sds) 
# 
# Extract FLO number (10 is a special case) for the label. 
# 
    if(nchar(first) == 4) 
        fx <- substring(first, 2, 3) 
    else fx <- substring(first, 2, 2) 
    plot(result[, 1], result[, 2], ylim = c(0, 1), type = "l", xlab = 
        paste("Perf. Obj.", fx), ylab = 
        paste("Prob. of scoring above", crit), main = 
        paste(xref[xref[, "two"] == lang, "long"], "/", 
        dlpt, sep = "")) 
# 
# ...then compute and display the cut-off itself. 
# 
    app <- approx(result[, 2], result[, 1], 0.8) 
    text(20, 0.8, paste("80% Cutoff:", round(app$y))) 
    return(result) 
} 
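
The calculation at the heart of the function above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in the thesis: the helper names (`normal_cdf`, `prob_above`, `cutoff_score`) are hypothetical, the intercept, slope, residual standard deviation, and standard error are invented for illustration, and the standard error of the fitted value is held constant for simplicity, whereas the S-PLUS function obtains it per point from `predict`.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_above(fit, se_fit, resid_scale, crit=40.0):
    """P(score > crit) under a normal approximation. The prediction
    standard deviation combines the residual standard deviation with
    the standard error of the fitted value."""
    sd = math.sqrt(resid_scale ** 2 + se_fit ** 2)
    return 1.0 - normal_cdf((crit - fit) / sd)


def cutoff_score(xs, probs, target=0.8):
    """Linearly interpolate the predictor score at which the probability
    curve first reaches `target`; return None if it never does."""
    for k in range(len(xs) - 1):
        p0, p1 = probs[k], probs[k + 1]
        if p0 <= target <= p1 and p1 > p0:
            return xs[k] + (target - p0) * (xs[k + 1] - xs[k]) / (p1 - p0)
    return None


# Hypothetical fitted model (assumption: all four values are invented).
intercept, slope = 25.0, 0.25
resid_scale, se_fit = 5.0, 0.5

n = 20
xs = [100.0 * i / (n - 1) for i in range(n)]  # same grid as seq(0, 100, length = n)
probs = [prob_above(intercept + slope * x, se_fit, resid_scale) for x in xs]
print("80% cutoff:", cutoff_score(xs, probs))
```

The cutoff is the performance objective score at which a student is estimated to have an 80 percent chance of scoring above the criterion on the DLPT.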
APPENDIX E. S-PLUS FUNCTION FOR FRONTIER GRAPHS 


function(lang, first, second, dlpt, data = big, crit = 40, prob = 0.8, 
    return.model = F, n = 20) 
{ 
# 
# Prepare.grid: prepare a grid for making a cool 3D plot. 
# 
# Arguments: lang: two-letter language abbreviation 
#            first: name of first FLO test 
#            second: name of second FLO test 
#            dlpt: one-letter choice of dlpt (L, R, or S) 
#            crit: cut-off value of interest 
#            prob: probability of exceeding "crit" on "dlpt" 
#            return.model: If True, return model: useful for debugging 
#            n: Number of points at which to compute prob. 
# 
# Start by trying to handle zeros 
# 
    zeros <- data[, first] == 0 | data[, second] == 0 
# 
# Stick these things into frame 1. This gets around a well-known bug 
# in Splus in which modelling functions cannot find objects in local 
# frames. 
# 
    assign("lang", lang, frame = 1) 
    assign("zeros", zeros, frame = 1) 
# 
# Create the text of the model statement, and execute it. Save it in 
# "out". 
# 
    model.txt <- paste("lm(DLPT.", dlpt, " ~ ", first, " + ", second, 
        ", data = data, na.action = na.omit, ", 
        "subset = LANG == lang & !zeros)", sep = "") 
    out <- eval(parse(text = model.txt)) 
# 
# Set up the matrix of results. The first column is the x's. 
# 
    result <- matrix(0, n, 2) 
    result[, 1] <- seq(0, 100, length = n) 
# 
# Set up the x-label. "F10A" is a special case. These might be DLPT's, 
# too, for the GAFB case. 
# 
    if(substring(first, 1, 1) == "F") { 
        if(nchar(first) == 4) 
            f.txt <- paste("Perf. Obj.", substring(first, 2, 3)) 
        else f.txt <- paste("Perf. Obj.", substring(first, 2, 2)) 
    } 
    else f.txt <- first 
    if(substring(second, 1, 1) == "F") { 
        if(nchar(second) == 4) 
            s.txt <- paste("Perf. Obj.", substring(second, 2, 3)) 
        else s.txt <- paste("Perf. Obj.", substring(second, 2, 2)) 
    } 
    else s.txt <- second 
    for(i in 1:n) { 
        cat("Finding frontier ", i, "\n") 
        second.test <- seq(0, 100, length = n) 
        pred.list <- predict(out, cbind(1, rep(result[i, 1], n), 
            second.test), se.fit = T) 
        preds <- pred.list$fit 
        sds <- sqrt(pred.list$resid^2 + pred.list$se^2) 
        temp.res <- 1 - pnorm((crit - preds)/sds) 
        app.out <- approx(temp.res, second.test, 0.8) 
        result[i, 2] <- app.out$y 
    } 
# 
# Draw the picture and quit. 
# 
    plot(result[, 1], result[, 2], xlab = f.txt, ylab = s.txt, main = 
        paste("80% Frontier for", xref[xref[, 
        "two"] == lang, "long"], "DLPT_", dlpt), type = 
        "l", xlim = c(0, 100), ylim = c(0, 100)) 
    if(return.model == T) 
        return(out) 
} 
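
The frontier computed by the function above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis code: the function name `frontier` and the coefficients `b0`, `b1`, `b2` are hypothetical, the values passed to it are invented for illustration, and a single prediction standard deviation `sd` is used for every grid point, whereas the S-PLUS function recomputes it from the standard error of each fitted value.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def frontier(b0, b1, b2, sd, crit=40.0, target=0.8, n=20):
    """For each first-test score on a grid, interpolate the second-test
    score at which P(score > crit) reaches `target`. The second value
    is None where no score in [0, 100] reaches the target."""
    grid = [100.0 * i / (n - 1) for i in range(n)]
    points = []
    for x1 in grid:
        # Probability of exceeding crit along the second-test grid.
        probs = [1.0 - normal_cdf((crit - (b0 + b1 * x1 + b2 * x2)) / sd)
                 for x2 in grid]
        x2_cut = None
        for k in range(n - 1):
            p0, p1 = probs[k], probs[k + 1]
            if p0 <= target <= p1 and p1 > p0:
                x2_cut = grid[k] + (target - p0) * (grid[k + 1] - grid[k]) / (p1 - p0)
                break
        points.append((x1, x2_cut))
    return points


# Hypothetical two-predictor model (assumption: all values are invented).
for x1, x2 in frontier(25.0, 0.1, 0.15, 5.0):
    print(round(x1, 1), None if x2 is None else round(x2, 1))
```

Each pair on the frontier is a combination of the two test scores that gives an estimated 80 percent chance of exceeding the criterion; with positive coefficients, a higher score on the first test lowers the score required on the second.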
APPENDIX F. FRONTIER GRAPHS FOR DLIFLC MODELS WITH TWO 
MAIN EFFECTS 
APPENDIX G. FRONTIER GRAPHS FOR GAFB BLOCK TESTS 